In [1]:
# To conserve interactive plots when downloading the notebook as HTML
# %%HTML
# <script src="require.js"></script>
In [2]:
import sys, os
from importlib import reload
import typing, requests, json, logging
reload(logging)
logging.basicConfig(stream=sys.stdout, level=logging.INFO, 
                    format='%(asctime)s [%(levelname)s] %(name)s - %(message)s',
                    datefmt='%Y-%m-%d %H:%M:%S',
                   )
from IPython.display import Markdown, display

import pandas as pd
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)

from datetime import datetime
import numpy as np
import plotly.express as px
from plotly.missing_ipywidgets import FigureWidget
import seaborn as sns

def save_html_plot(
    fig: FigureWidget, path: str
) -> None:
    fig.write_html(
        path,
        config={
            "displaylogo": False,
        },
        include_plotlyjs="cdn",
    )

OUTPUTS_PATH = "outputs/"


api_keys_dict = json.load(open("api_keys.json", "r"))
# print("api_keys.json contents:", api_keys_dict)
ETHERSCAN_API_KEY = api_keys_dict["etherscan"]
BITQUERY_API_KEY = api_keys_dict["bitquery"]

if ETHERSCAN_API_KEY == "1234abcdf" or BITQUERY_API_KEY == "1234abcdf":
    raise ValueError("Fill the api_keys.json file with your API keys before running this notebook!")
2022-11-02 13:34:52 [INFO] numexpr.utils - NumExpr defaulting to 8 threads.

Overview¶

Author: Dani Salgado Rojo

In this tutorial you will learn the basic notions and knowledge about the Blockchain technology as well as some practical examples about retrieving and visualizing data from a blockchain via public APIs with Python.

We will focus mostly on the Ethereum blockchain.

For any comments about the tutorial (suggestions for improvement, typos, error reports, etc), please contact me at daniel.salgado.rojo@gmail.com.

Tutorial roadmap (~ 3 h)¶

    1. Introduction to Blockchain (~ 1h 20 min, videos + summarized concepts in this notebook)
    1. Understanding Blockchain Data (~ 1h)
    1. Hands-on tutorial with Python (~ 30 min without doing the extra Exercises)
    1. Additional resources, courses, links and certifications.

Some sections include at the end some additional resources. If marked as (Optional), you may skip them and continue reading unless you need to clarify more the concepts; If marked as (Recommended), I strongly encourage you reading the resource.

I also recommend you doing the Exercises proposed at the end of Section 3.

1. Introduction to Blockchain¶

Before you continue reading this tutorial, please, start watching two particular videos from YouTube.

  • You may have heard about NFTs, Bitcoin, Cryptocurrencies and, perhaps, "Blockchain", but you may not have had interest on these topics so far. If you need to convince yourself more about why you should learn about it, and why Blockchain technologies will probably make a huge impact in our society and economy in the future, I highly recommend you watching the following Video presentation "¿Qué es la New Economy? Ponencia sobre Blockchain y Descentralización" made by the youtuber Ury Vice (Audio in Spanish, ~ 35min, October 2022).
  • The next video will provide the reader an initial Visual and Exemplified notion of what a blockchain and (its / one of its) cryptocurrencies are. This way, I expect the theory presented in this notebook will be easier to understand and consolidate: "Have you ever wondered how Bitcoin (and other cryptocurrencies) really work?" (Audio in English ~ 25 min, October 2017).

1.1 What is Blockchain¶

Blockchain (or chain of blocks) is a technology to record changes (transactions) over time in a non-destructive way that became known for its use around cryptocurrencies, but its scope goes beyond them. This technology is used like a scribe's book, where each event or modification of the data is written as a new block of a chain and in this way, there is a settled, certified record and its integrity and availability are guaranteed. If this content is also encrypted, it guarantees confidentiality. This unique and unalterable registry of data is distributed in several nodes of a decentralized network where each block of the chain stores information about itself, about valid transactions of some kind, and about its relationship with the previous and next block of the chain. The figure below shows a simplified visual representation of the blockchain transactions process. See the figure in section 1.2 to see an abstract representation of a blockchain.

blockchain process

Although, as we said, the use of blockchain is strongly associated with cryptocurrencies, this technology can be used for other types of digital assets, such as NFTs, smart contracts, tokens, etc. In addition, in recent years, blockchain-based solutions have had a very important growth, being adopted by industries such as finance to improve the security and efficiency of existing products and services, as well as by the food industry and the supply chain for the product tracking or in the scientific community for documentation management.

Going back, blockchain technology allows transactions to be recorded and an asset to be tracked within the network without the need for intermediaries (decentralization), since the thousands or millions of nodes (e.g., computer machines from different authorized persons and entities) that make up the network are the ones in charge of verifying and validating those transactions that will be recorded in a new block that is added to the chain. To accomplish this distributed agreement (validation and verification), blockchains need a consensus mechanism.

consensus

The (regular) users that want to operate on a blockchain (via transactions, for instance to trade cryptocurrencies with other users) need to have an "account" associated to that blockchain in particular. In a few words, a user is uniquely identified in the blockchain via a public key (which to be simple, would be like a "bank account address" or "your email" that anyone can see) which can be used by other users, for instance, to send some cryptocurrencies to the account (address) associated with that public key. Also, each user must have a secret key (a code like a "PIN" or "password") paired with the public key, which is what uniquely proves that you are the owner of the cryptocurrency received in the transaction. In contrast, "miners" are authorized users that provide their computational power (computers) to conform a particular type of nodes that participate in the validation of new Blocks or transactions.

There are different types of blockchain networks: there are private, public, federated and hybrid networks. Depending on the characteristics of each project or sector, each organization will choose the one that best suits its needs among the different types of blockchain. Although public blockchains, such as Bitcoin or Ethereum, are the best known, many companies have become interested in the use of private blockchains to protect sensitive information. In this sense, there are companies offering blockchain as a service (BaaS), which allows other organizations to create and use applications based on blockchain through a cloud infrastructure.

blockchain-market-map

1.2 Blockchain and cryptocurrencies¶

There are different types of cryptocurrencies. Each of them is used in a similar way to traditional money, since they serve to transfer value or as a means of exchange. While some people often refer to cryptocurrency (or "cryptocoin") as a synonym for token, there are some differences. On the one hand, a cryptocurrency is created within its own native blockchain network (e.g., the native currency of the Ethereum blockchain is the Ether (ETH)) while tokens are created using existing blockchain protocols (e.g., CryptoKitties is an example of a token-based (NFTs) game which is built on top of the Ethereum blockchain).

In the case of cryptocurrencies, we can think of blockchain as the ledger where each transaction is recorded. Its operation can be complex to understand if we delve into the internal details of its implementation, but the basic idea is easy to follow.

In each block it is stored:

  1. A number of valid records or transactions,
  2. information regarding that block,
  3. its link with the previous block and the next block through the hash (a unique id that would be like the fingerprint of the block) of each block.

Therefore, each block has a specific and immovable place within the chain, since each block contains information (hash) of the previous block. The complete chain is stored in each node of the network that makes up the blockchain, so an exact copy of the chain is stored in all the nodes that participate in the network.

As new records are created, they are first verified and validated by the nodes in the network and then added to a new block that is linked to the chain.

esquema-blockchain Illustration of a blockchain of Three blocks surrounded by what would represent the network of nodes that save a copy of the information stored in the blockchain and work in validating transactions and operations within the blockchain

To recap:

  • We can think of a blockchain as database that is updated and shared across many computers in a network.

  • "Block" refers to data and state being stored in consecutive groups known as "blocks". (e.g., in the Ethereum blockchain, if you send ETH to someone else, the transaction data needs to be added to a block to be successful).

  • "Chain" refers to the fact that each block cryptographically references its parent. In other words, blocks get chained together. The data in a block cannot change without changing all subsequent blocks, which would require the consensus of the entire network.

  • Every computer in the network must agree upon each new block and the chain as a whole. These computers are known as "nodes". Nodes ensure everyone interacting with the blockchain has the same data. To accomplish this distributed agreement, blockchains need a consensus mechanism.

From now on in this tutorial, we will focus on public blockchains which in most of the cases have a native cryptocurrency ("coin") associated with it, among many other tokens (projects build on top of the blockchain that provide new 'digital assets' which are completely different from what the "coin" or 'native asset' of the blockchain is).

1.3 The blockchain ecosystem¶

The elements that make up and contribute to the blockchain are its ecosystem. Elements that you will find in any blockchain ecosystem are:

  • Developers: The parties who build and update the blockchain network.
  • Nodes: Devices that run the blockchain's software and keep an up-to-date history of its transactions (i.e., contain a copy of the blockchain).
  • Miners/validators: Participants in the blockchain's transaction validation process (a particular type of nodes). A blockchain can have either miners or validators, depending on the validation system it uses.
  • Stakeholders/Users: Holders of the blockchain's native cryptocurrency. This includes people who have bought it as a cryptocurrency investment and who plan to use it for payments. With some blockchains, stakeholders have the right to make and vote on proposals, giving them a say in its future.

blockchain-ecosystem

Although those are the core elements of a blockchain ecosystem, they are often only the beginning. Starting with Ethereum, many modern blockchains have launched with the capability of running smart contracts. A smart contract is a self-executing blockchain contract (code), like a program that runs on a blockchain.

Smart contracts dramatically expand blockchain technology capabilities, and they introduce a new group in a blockchain ecosystem: the projects built on that blockchain. Here are just a few of the types of projects that can be created on a smart contract blockchain:

  • Decentralized finance (DeFi) platforms offer alternatives to traditional financial services, such as crypto lending programs.
  • Non-fungible tokens (NFTs) encoded in the blockchain represent ownership of a unique digital asset.
  • Blockchain video games combine elements of cryptocurrency with traditional online gaming.
  • Virtual real estate exists in a digital world and can be bought and sold.

1.4 Blockchain and the new Web 3.0¶

The internet has completely changed the game for the modern person, it changed the way we live, work and socialize and it is still shifting gears for us; it brought about a revolution that turned the world into a global village. The internet has previously experienced 2 shifts, with a 3rd on its way. Web 1.0 (Web1), Web 2.0 (Web2), and Web 3.0 (Web3) came to limelight in 1996, 2006, and 2016 respectively; each intended to be an upgrade from the previous and an upgrade it sure has been.

web evolution

  • The first shift was Web1 otherwise known as the "Wold Wide Web", the first-generation internet which gave us a read-only access to platforms, a marvel of engineering at the time. Built on HTML technology, it provided web pages for companies to put out information on websites to an audience reading up to date news any day, anytime.

    In Web1, Internet users had a fairly limited role: the consumption of information that was hosted on computer servers.

  • Web2 came to prominence with the so-called "dotcom bubble era". Web2 allowed users to modify designs and code in more detail, but it also meant the corporatization of the Internet.

    In this way, Web2 encourages interaction between the Web and users and the Internet becomes a collaborative platform in which all users can participate (blogs, social networks,...), giving them a greater voice than with Web1, but with the disadvantage that this Web is actually dominated by companies offering services in exchange for personal data. By registering to a web page or accepting cookies, we are giving personal information to the owners of those pages for free, and they sell it to other corporations which then use it, for instance, to create customized ads and marketing.

  • Thus follows Web 3, the overarching goal of this iteration of the Web is to make the internet a lot more intelligent, autonomous, and open. To make the internet akin to real-life, utilizing artificial-intelligence, secure transfer of value using blockchains, 3D graphics, virtual reality, and augmented reality.

    Web3 is intended to be a pro-privacy and anti-monopoly Web; this involves eliminating middlemen like the government or entities who have been in control of user data in Web2. Decentralizing the web will give users complete control of data and security encryption. Blockchains will be accessible to everyone, eliminating gender, geographical, sexual orientation, or socio-economic bias.

2. Understanding Blockchain data: the Ethereum Blockchain¶

In this section we will deep into learning how blockchain data looks like in the Ethereum blockchain using the Etherscan explorer. The majority of features (or at least, similar ones) in the blockchain data from Ethereum are also present in other blockchains, but we focus on the Ethereum one since it is, together with Bitcoin, one of the two most popular and currently used blockchains for real world applications.

This section will be useful for us for when we retrieve blockchain data from APIs in Section 3; to understand and easily guess, for each of the features/variables retrieved, to what they are referring to.

2.1 The Ethereum blockchain¶

Ethereum is a public blockchain with a "virtual" computer embedded in it. It is the foundation for building apps and organizations in a decentralized, permissionless, censorship-resistant way.

In the Ethereum universe, there is a single, canonical computer (called the Ethereum Virtual Machine, or EVM) whose state everyone on the Ethereum network agrees on. Everyone who participates in the Ethereum network (every Ethereum node) keeps a copy of the state of this computer. Additionally, any participant can broadcast a request for this computer to perform arbitrary computation. Whenever such a request is broadcast, other participants on the network verify, validate, and carry out ("execute") the computation. This execution causes a state change in the EVM, which is committed and propagated throughout the entire network.

Requests for computation are called transaction requests; the record of all transactions and the EVM's present state gets stored on the blockchain, which in turn is stored and agreed upon by all nodes.

Cryptographic mechanisms ensure that once transactions are verified as valid and added to the blockchain, they can't be tampered with later. The same mechanisms also ensure that all transactions are signed and executed with appropriate "permissions" (no one should be able to send digital assets from Alice's account, except for Alice herself, given that Alice is the only person in the world that knows his private key).

2.1.1 What is Ether¶

Ether (ETH) is the native cryptocurrency of Ethereum. The purpose of ETH is to allow for a market for computation. Such a market provides an economic incentive for participants to verify and execute transaction requests and provide computational resources to the network.

Any participant who broadcasts a transaction request must also offer some amount of ETH to the network as a bounty called Gas Fee. The network will award this bounty to whoever eventually does the work of verifying the transaction, executing it, committing it to the blockchain, and broadcasting it to the network.

Note: When we talk about "Ethereum" we are technically referring to a blockchain, the Ethereum blockchain. Usually people say "I have bought 'X' Ethereum" or "I have 'X' Ethereum", but they are referring to the fact that they "have bought" or "have in possession" 'X' units of ETH or Ether" in the Ethereum blockchain.

2.1.2 What are smart contracts?¶

In practice, participants don't write new code every time they want to request a computation on the EVM. Rather, application developers upload programs (reusable snippets of code) into EVM state, and users make requests to execute these code snippets with varying parameters. We call the programs uploaded to and executed by the network smart contracts.

At a very basic level, you can think of a smart contract like a sort of vending machine: a script that, when called with certain parameters, performs some actions or computation if certain conditions are satisfied. For example, a simple vendor smart contract could create and assign ownership of a digital asset if the caller sends ETH to a specific recipient. In other words, this contract would allow users to buy tokens, NFTs, and the likes build on top of Ethereum paying with ETH.

Any developer can create a smart contract and make it public to the network, using the blockchain as its data layer, for a fee paid to the network. Any user can then call the smart contract to execute its code, again for a (Gas) fee paid to the network.

Thus, with smart contracts, developers can build and deploy arbitrarily complex user-facing apps and services such as: marketplaces, financial instruments, games, etc.

2.1.3 What is an Ethereum Account? And a (crypto) Wallet?¶

An Ethereum account is an entity with an ether (ETH) balance that can send transactions on Ethereum..

Accounts can be user-controlled or deployed as smart contracts:

  1. Externally-owned account (EOA) – controlled by anyone (humans) with the private keys. Ex: if I want to buy/sell cryptocurrencies and tokens, that is the kind of account I would own.
  2. Contract account – a smart contract deployed to the network, controlled by code. EXAMPLE

Both account types have the ability to:

  • Receive, hold and send ETH and tokens
  • Interact with deployed smart contracts

[Important] A wallet is an interface for managing an (EOA) Ethereum account as well as its public and private key. Given an Ethereum account, the public key is uniquely associated with an Ethereum (account) address: if the account is an EOA this address is referred to as wallet address (or EOA address), if the account is a contract account, we refer to it as contract address or smart contract address.

Additional resources:

  • What is an Ethereum address (Complementary, Optional)
  • What is a wallet (Optional)
  • Understanding Ethereum accounts (Optional)

2.1.4 Transactions within the Ethereum Blockchain¶

An Ethereum transaction refers to an action initiated by an externally-owned account, as we already know, an account managed by a human, not a contract. For example, if Alice sends Bob 1 ETH, Alice's account must be debited and Bob's must be credited (this is an example of regular transaction). This state-changing action takes place within a transaction.

Transactions, which change the state of the EVM, need to be broadcast to the whole network. Any node can broadcast a request for a transaction to be executed on the EVM; after this happens, a validator will execute the transaction and propagate the resulting state change to the rest of the network. Transactions require a (Gas) fee and must be included in a validated block.

A submitted transaction includes the following information:

  • recipient ("to") – the receiving address (if an externally-owned account, the transaction will transfer some quantity "value" of ETH. If a contract account, the transaction will execute the contract code)
  • signature ("from") – the identifier of the sender. This is generated when the sender's private key signs the transaction and confirms the sender has authorized this transaction
  • nonce - a sequentially incrementing counter which indicates the transaction number from the account (0 if its the first transaction from that account, 1 if its the second and so on).
  • value – amount of ETH to transfer from sender to recipient (in WEI, a denomination of ETH. More details below)
  • data – optional field to include arbitrary data. Example: a message to the receiving address "Hey Bob, I send you some ETH to congratulate you for your birthday".
  • gasLimit – the maximum amount of gas units that can be consumed by the transaction. Units of gas represent computational steps
  • maxPriorityFeePerGas - the maximum amount of gas to be included as a tip to the validator
  • maxFeePerGas - the maximum amount of gas willing to be paid for the transaction (inclusive of baseFeePerGas and maxPriorityFeePerGas)

Gas is a reference to the computation required to process the transaction by a validator. It is paid in ETH but denominated in other units such as Wei and Gwei to be more measurable. Wei is used when describing smaller value transactions, and Gwei is commonly used when dealing with gas fees.

  • 10^18 Wei = 1 ETH (e.g., units for the transaction value)
  • 10^9 Gwei = 1 ETH (e.g. units for the transaction gasPrice, the price of a gas unit)

The transaction object will look a little like this:

{
  from: "0xEA674fdDe714fd979de3EdF0F56AA9716B898ec8",
  to: "0xac03bb73b6a9e108530aff4df5077c2b3d481e5a",
  gasLimit: "21000",
  maxFeePerGas: "300",
  maxPriorityFeePerGas: "10",
  nonce: "0",
  value: "10000000000"
}

[Important] On Ethereum there are a few different types of transactions:

  1. Regular transactions: a transaction from one account to another.
  2. Contract deployment transactions: a transaction without a 'to' address, where the data field is used for the contract code.
  3. Execution of a contract: a transaction that interacts with a deployed smart contract. In this case, 'to' address is the smart contract address.

The Life Cycle of a Transaction (Detailed explanations here)¶

  1. A transaction is created.
  2. The transaction creator gets a transaction hash (also known as Transaction ID or txhash) that they or anyone can use to look for the transaction on a block explorer (A website or software application that allows users to browse and analyze the entire history of a specific blockchain network.).
  3. The transaction gets broadcasted to the network and then waits for a miner to pick it up and verify it.
  4. The waiting time depends on the network's current traffic and the gas fee set for the transaction.
  5. The transaction is completed and considered as a success once it has been picked up and verified by a miner.

Additional resources:

  • Visual Demo (Recommended): to visualize the flux of a transaction where "Alice" sends some ETH to "Bob" via a regular transaction within the Ethereum network.
  • What are ethereum transactions (Optional)

2.1.4 Enterprise Ethereum use cases¶

An overview of real world applications build within the Ethereum network:

Blockchain-Process

2.2 Understanding the Ethereum blockchain with Data in the Etherscan explorer¶

Etherscan is a block explorer and analytics platform which lets you view and analyze assets, balances and transactions on the Ethereum network. You can also easily interact with smart contracts and check gas prices on Etherscan.

While it seems to share many functions of a crypto wallet, Etherscan is not a wallet service provider. It cannot store private keys, nor can it control outgoing or incoming transactions on the Ethereum blockchain. It only shows public data or on-chain data.

Additional resources:

  • Guide to etherscan I (Optional)
  • Guide to etherscan II (Optional)

2.2.1 Blocks¶

Etherscan provides with all metadata an information regarding to the blocks that are being created in the Ethereum network.

There is a page to see a table with all the blocks that are being added to the blockchain. Please, read the etherscan's documentation about the blocks page to learn about which fields are shown about a block on that page (Block#, Age, Txn, Fee Recipient, Gas Used, Gas Limit, Base Fee, Reward, Burnt Fees (ETH)).

blocks page

Then, one can see the metadata and details about a concrete block (lets say, Block #15818611 ]) in the blockchain visiting a "block details" page. Please, read the etherscan's documentation about the block details page to learn about each of the fields that define a particular block. A concrete example is shown below.

block details page

2.2.2 "Address" related pages: Wallets, Smart Contract and Tokens¶

A. Token page¶

As we mentioned earlier in section 1, a type of digital assets that are built within a blockchain are the (crypto) tokens.

These tokens have a smart contract associated with them. In the Ethereum blockchain, a Token Smart Contract is the (Ethereum) smart contract representing the mechanism of creation and distribution of the Token.

Please, read the etherscan's documentation about the token pages to learn about each of the feature shown in a Token page.

Popular Token Example

Although it is what is known as a "shit" or "meme" coin (at least when the Token project started) due to its lack of value or utility, one of the most "hyped" tokens is SHIBA INU. Below we can see how its Token page looks like:

shiba inu token page

Additional resources:

  • Exploring the NFT page in Etherscan (Optional)

B. Smart Contract Address page¶

Another (general) type of digital asset build within the blockchain are smart contracts. We have just seen a particular example of smart contract, the token smart contracts that define a token.

For instance, the smart contract address page for the SHIBA INU token is the following:

shiba inu contract address page

Smart contracts are developed in a programming language similar to JavaScript or C which is called Solidity. In the "Contract" tab, we can see the details and code of an smart contract.

Please, read the etherscan's documentation about the contract address page to learn about more details.

C. Wallet Address (EOA) page¶

Overview

  • Imagine you create an Ethereum account which you connect it with a Wallet (for instance using the Metamask plugin in Google Chrome) and you want to see the history of transactions (bought/sold tokens, token swaps/transfers,...), the balances and any other information related to your account using Etherscan. This information will be displayed in the Address Etherscan page which you can find by putting the Ethereum address associated with your account in the Etherscan search bar. To put an example, we can have a look at a/the wallet address owned by Vitalik Buterin, one of the co-founders of the Ethereum project in 2014 here.

vitalik wallet address page

  • Cryptocurrency exchanges such as Binance and Kraken also have EOA addresses in the Ethereum blockchain (multiple account addresses to operate in the Ethereum blockchain). For instance, Kraken exchange has the following accounts:
    • Kraken Exchange Wallet Addresses

[Important] After this few introduction, please read the Etherscan's documentation on EOA address pages to learn about all the data that can be accessed/visualized for a particular wallet address.

Additional resources:

  • Clustering Ethereum Addresses (Recommended)

2.2.3 Transactions¶

As we already know, transactions are the basic compounds that constitute a block in the blockchain. But how are transactions defined in terms of data (features)?

To answer this question, we will be exploring three basic examples of transactions that belong to one of the transaction types we saw in Section 2.1.4.

When viewing an address in Etherscan, there will be multiple tabs for the different type of transactions performed by that address:

etherscan transaction tabs

A. Normal transactions

Transactions where some ETH is sent "From" a wallet address (EOA) directly "To" another EOA. When viewing an address on Etherscan, this type of transaction will be shown under the Transactions tab. Click here for a sample transaction:

normal transaction

In this transaction, an anonymous wallet address sends some ETH to a wallet address owned by Etherscan to receive donations (See the "To" and "From" fields). With the terminology of Section 2.1.4, this is an example of a Regular transaction.

Another example for these type of transactions would be when you buy something to someone else and you want to pay him/her with ETH.

Blockchain-Process

B. Internal transactions

Transactions that involve a transfer of ETH that is carried out through a smart contract as an intermediary. When viewing an address on Etherscan, this type of transaction will be shown under the Internal Txns tab (in the Etherscan Address page). Click here for a sample transaction.

internal transaction

We see that the "To" feature is a Contract address, not a Wallet address as it was for a Normal transaction. Therefore, with the terminology of Section 2.1.4, this is an example of a Execution of a contract transaction.

From the Internal Txns tab (now in the Transaction details page) we can see that "From" a contract which is executed (called), 0.02415993 Ether are transferred "To" the Etherscan donations wallet address:

internal transaction tab

C. Token transfers

Transactions of ERC-20 or ERC-721 tokens are labelled as Token Transfer transactions. When viewing an address on Etherscan, this type of transaction will be shown under either the Erc20 Token Txns, Erc721 Token Txns, Erc1155 Token Txns (and so on) tabs, depending on the respective token type. Click here for a sample transaction.

token transfer transaction

We see that the "To" feature is a Contract address. Therefore, with the terminology of Section 2.1.4, this is another example of a Execution of a contract transaction, where the contract that is being executed is the contract of a Token (as we saw, the token contract is the provider of the mechanism of creation and distribution of the Token).

To get finished with transactions, please read the "What are the key areas of information covered?" section of that Etherscan's article where each of the fields that appear in the 'transaction details' page are explained. If you have time, I recommend you read the whole article.

3. Hands-on: interpretation, retrieval from APIs, processing and visualization of blockchain data with Python¶

In this section we will see how to get data from an external API that allows to retrieve data from the blockchain using Python.

The APIs that we will use are Etherscan's API, Bitquery (and its GraphQL IDE) and Coingecko.

Prerequisites

  • In order to rerun this notebook and/or doing the extra exercices found at the end of this section of the tutorial, it is necessary that you obtain your own API key from Bitquery. Its Free and you just have to register with one of your emails.
  • Similarly register and create and API key for Etherscan.
  • For now, Coingecko does not need an API key to access the free API endpoints they provide. In case it did, you should register at their page too.

Keep in mind that API Keys are private, so you should not share it with anyone.

How to create an Etherscan API key¶

  1. Create an Etherscan account
  2. Create an API key to use for this tutorial: for example, you can name it blockchain-tutorial-python.

How to obtain your own Bitquery API key¶

  1. Go to the Bitquery IDE page
  2. A popup should show up to make you login or signup.

bitquery api key popup

  1. If you already had an account just login. Otherwise click the Donot have an account? button and proceed with the registration.
  2. Once you are logged in, click the user profile icon on the top-right corner and then click the "API Key" tab. A new popup will open showing you the API key so that you can just copy it for later usage.

bitquery api key popup 2

3.1 Basic introduction to API requests¶

In order to make an API request to some API endpoint (an specific url given by the API provider) with Python, we can use the requests module.

At the same file level as this notebook, there should be located a api_keys.json file where you should replace the "1234abcdf" value by your own API keys. At the time of writting this tutorial, you don't have to fill the "coingecko" api field in that JSON since you can make API requests to Coingecko without API key.

api_keys.json

{
    "etherscan": "1234abcdf",
    "bitquery": "1234abcdf",
    "coingecko": "1234abcdf"
}

3.1.1 Etherscan¶

The first example we will see consists on retrieving the last "N" Normal transactions from a wallet or contract address

The goal is to retrieve all transactions from the Vitalik Buterin wallet only during the date 2022-10-26.

vb 26 october 2022 transactions

We see that there are 2 transactions where the VB is the "to" address, and 1 transaction where it is the "from" address of the transaction.

The transactions are ordered by date descendingly so that newest transactions are first. The transactions we want to retrieve are located between block #15828766 and block #15834718.

I have created the get_etherscan_post_request_json function to facilitate making POST requests to endpoints of the Etherscan API.

In [3]:
def get_etherscan_post_request_json(url_params_dict: typing.Dict[str, typing.Any],
                                    api_key: typing.Optional[str],
                                    verbose: bool = False) -> dict:
    """
    Function to build Etherscan API URLs and performing a POST request given a set of parameters and a valid API key.

    Example of URL:

    url_params_dict = {
        "module": "account",
        "action": "txlist",
        "address": "0xc5102fE9359FD9a28f877a67E36B0F050d81a3CC",
        "startblock": 0,
        "endblock": 99999999,
        "page": 1,
        "offset": 10,
        "sort": "asc"
    }

    https://api.etherscan.io/api
           ?module=account
           &action=txlist
           &address=0xc5102fE9359FD9a28f877a67E36B0F050d81a3CC
           &startblock=0
           &endblock=99999999
           &page=1
           &offset=10
           &sort=asc
           &apikey={api_key}
    """

    url = ""
    base_url = "https://api.etherscan.io/api"

    url += base_url

    for i, item in enumerate(url_params_dict.items()):
        key = item[0]
        value = item[1]
        if i == 0:
            url += f"?{key}={value}"
        else:
            url += f"&{key}={value}"

    url += f"&apikey={api_key}"

    if verbose:
        logging.info(f"Requested url: {url}")

    request = requests.post(url)

    if request.status_code == 200:
        return request.json()
    else:
        raise Exception(
            f"Post request failed for url '{url}'. Request status code: {request.status_code}"
        )

The dictionary of parameters to use within a call to the get_etherscan_post_request_json function to obtain the transactions at 2022-10-27 are the following:

In [4]:
url_params_dict = {
    "module": "account",
    "action": "txlist",
    "address": "0xAb5801a7D398351b8bE11C439e05C5B3259aeC9B", # Vitalik Buterin wallet address
    "startblock": 15828766, # Block to start searching transactions from
    "endblock": 15834718,  # Block to stop searching transactions at
    "page": 1,
    "offset": 10, # get at most 10 transactions (we expect to be just 3)
    "sort": "desc", # or asc
}
In [5]:
response_json = get_etherscan_post_request_json(url_params_dict, ETHERSCAN_API_KEY,True)

logging.info(f"Response JSON keys: {response_json.keys()}")
logging.info(f"status: {response_json['status']}")
logging.info(f"message: {response_json['message']}")
2022-11-02 13:34:55 [INFO] root - Requested url: https://api.etherscan.io/api?module=account&action=txlist&address=0xAb5801a7D398351b8bE11C439e05C5B3259aeC9B&startblock=15828766&endblock=15834718&page=1&offset=10&sort=desc&apikey=MCXN2GXN8WV24D8PIN8A1264VAI2AN37J7
2022-11-02 13:34:56 [INFO] root - Response JSON keys: dict_keys(['status', 'message', 'result'])
2022-11-02 13:34:56 [INFO] root - status: 1
2022-11-02 13:34:56 [INFO] root - message: OK

Transform the list of transactions located in the "result" field of the API response JSON to a Pandas DataFrame using the json_normalize method:

In [6]:
results = response_json["result"]
results_df = pd.json_normalize(results)
logging.info(f"Dataset Features: {list(results_df.columns)}")

results_df.head()
2022-11-02 13:34:56 [INFO] root - Dataset Features: ['blockNumber', 'timeStamp', 'hash', 'nonce', 'blockHash', 'transactionIndex', 'from', 'to', 'value', 'gas', 'gasPrice', 'isError', 'txreceipt_status', 'input', 'contractAddress', 'cumulativeGasUsed', 'gasUsed', 'confirmations', 'methodId', 'functionName']
Out[6]:
blockNumber timeStamp hash nonce blockHash transactionIndex from to value gas gasPrice isError txreceipt_status input contractAddress cumulativeGasUsed gasUsed confirmations methodId functionName
0 15834718 1666817063 0x158fbc1f6d8b44409c4adfe0b51bb255f61b36b00a32fa0d1bcd80cc239cf07e 1317 0x79ea7326cd8d180d925e398756880445f060a4eefc9fb972cfda0304c212b492 113 0x1eaf03af983d1e20b1bab73964906c281e141d4f 0xab5801a7d398351b8be11c439e05c5b3259aec9b 0 32868 17364769372 0 1 0x68747470733a2f2f747769747465722e636f6d2f56425f546f6b656e2f7374617475732f313538353337303332323436363835323836350a0a 15593267 21912 47658 0x68747470
1 15833688 1666804655 0xd0cdc5ee47422c606de85854acdd70a4178145570b907fc9e3d5aaf0af7d2245 1315 0xcd4c6b8f9726649a85dd5bdaff6f2fc6c3774dd01e2bde3a481c5254b4e2e6bb 108 0x1eaf03af983d1e20b1bab73964906c281e141d4f 0xab5801a7d398351b8be11c439e05c5b3259aec9b 0 40572 31584089397 0 1 0x57652077696c6c2072697365207468652024564220746f6b656e20696e20796f757220686f6e6f75722c2042657374206e61727261746976652073696e636520534849420a0a4f776e65722064756d706564206f6e2074686520636f6d6d756e69747920616e6420636c6f736573207468652074656c656772616d2c62656361757365206f6620746865204c50206265656e2031303025206275726e656420616e642074686520636f6e74726163742072656e6f756e6365642c2074686520636f6d6d756e69747920726f7365206974206261636b2066726f6d2064656174682062656361757365206f66207768617420796f7520676976652075732064617920696e20616e6420646179206f75742077697468207468652045544820626c6f636b636861696e200a0a7468616e6b796f752c20796f75206861766520737570706c7920746f20646f207768617420796f75207769736820776974682e0a0a68747470733a2f2f742e6d652f5642636f6d6d756e697479455243 8676810 27048 48688 0x57652077
2 15828766 1666745363 0x4655ccc7f011a1e6ce378c1e781fd1c26290f655564185f724cc125b41707d86 209 0xaa5a81e87fa41e48fbede4ad5c8b6902c26ca720e2d4ee07107fb0e4b9918b2e 106 0xab5801a7d398351b8be11c439e05c5b3259aec9b 0xc79b96044906550a5652bcf20a6ea02f139b9ae5 3923686000000000000 21000 12383671336 0 1 0x 14001536 21000 53610 0x

Lets see the types of the columns in the dataframe and convert to numeric variables if necessary

In [7]:
results_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 20 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   blockNumber        3 non-null      object
 1   timeStamp          3 non-null      object
 2   hash               3 non-null      object
 3   nonce              3 non-null      object
 4   blockHash          3 non-null      object
 5   transactionIndex   3 non-null      object
 6   from               3 non-null      object
 7   to                 3 non-null      object
 8   value              3 non-null      object
 9   gas                3 non-null      object
 10  gasPrice           3 non-null      object
 11  isError            3 non-null      object
 12  txreceipt_status   3 non-null      object
 13  input              3 non-null      object
 14  contractAddress    3 non-null      object
 15  cumulativeGasUsed  3 non-null      object
 16  gasUsed            3 non-null      object
 17  confirmations      3 non-null      object
 18  methodId           3 non-null      object
 19  functionName       3 non-null      object
dtypes: object(20)
memory usage: 608.0+ bytes
In [8]:
relevant_integer_columns =  ["nonce" ,"transactionIndex", "value", "gas", "gasPrice", "gasUsed", "cumulativeGasUsed", "confirmations"]
results_df.loc[:,relevant_integer_columns] = results_df.loc[:,relevant_integer_columns].astype(float) 
# we convert to float since "value" is too large for int type

Create a date (In UTC, as displayed in the Etherscan tables) column from the Timestamp column

In [9]:
results_df.loc[:, "date"] = results_df["timeStamp"].astype(int).apply(datetime.utcfromtimestamp)
main_relevant_columns = ["date", "blockNumber", "hash", "from", "to", "value", "gas", "gasUsed", "gasPrice"  ]
results_df_relevant = results_df[main_relevant_columns]
In [10]:
results_df_relevant.head()
Out[10]:
date blockNumber hash from to value gas gasUsed gasPrice
0 2022-10-26 20:44:23 15834718 0x158fbc1f6d8b44409c4adfe0b51bb255f61b36b00a32fa0d1bcd80cc239cf07e 0x1eaf03af983d1e20b1bab73964906c281e141d4f 0xab5801a7d398351b8be11c439e05c5b3259aec9b 0.0 32868.0 21912.0 17364769372.0
1 2022-10-26 17:17:35 15833688 0xd0cdc5ee47422c606de85854acdd70a4178145570b907fc9e3d5aaf0af7d2245 0x1eaf03af983d1e20b1bab73964906c281e141d4f 0xab5801a7d398351b8be11c439e05c5b3259aec9b 0.0 40572.0 27048.0 31584089397.0
2 2022-10-26 00:49:23 15828766 0x4655ccc7f011a1e6ce378c1e781fd1c26290f655564185f724cc125b41707d86 0xab5801a7d398351b8be11c439e05c5b3259aec9b 0xc79b96044906550a5652bcf20a6ea02f139b9ae5 3923686000000000000.0 21000.0 21000.0 12383671336.0

To finish with this example, lets try to obtain the most similar table to the one shown in Etherscan

In [11]:
df_etherscan = results_df_relevant.replace("0xab5801a7d398351b8be11c439e05c5b3259aec9b", "Vitalik Buterin")

# Convert "value" from WEI to ETH and gasPrice from WEI to GWEI 
df_etherscan["value (ETH)"] = df_etherscan["value"] * 10**(-18)  
df_etherscan["gasPrice (GWEI)"] = df_etherscan["gasPrice"] * 10**(-9)  

# Txn Fee = Gas Used * Gas Price (and multiply by 10**(-9) since gasPrice is now in GWEI and we want Txn Fee in ETH)
df_etherscan["Txn Fee (ETH)"] = df_etherscan["gasUsed"] * df_etherscan["gasPrice (GWEI)"] * 10**(-9)
In [12]:
ETHERSCAN_TRANSACTIONS_TABLE_COLUMNS = ["hash", "date", "from", "to", "value (ETH)", "Txn Fee (ETH)", "gasPrice (GWEI)"]
df_etherscan[ETHERSCAN_TRANSACTIONS_TABLE_COLUMNS]
Out[12]:
hash date from to value (ETH) Txn Fee (ETH) gasPrice (GWEI)
0 0x158fbc1f6d8b44409c4adfe0b51bb255f61b36b00a32fa0d1bcd80cc239cf07e 2022-10-26 20:44:23 0x1eaf03af983d1e20b1bab73964906c281e141d4f Vitalik Buterin 0.000000 0.000380 17.364769
1 0xd0cdc5ee47422c606de85854acdd70a4178145570b907fc9e3d5aaf0af7d2245 2022-10-26 17:17:35 0x1eaf03af983d1e20b1bab73964906c281e141d4f Vitalik Buterin 0.000000 0.000854 31.584089
2 0x4655ccc7f011a1e6ce378c1e781fd1c26290f655564185f724cc125b41707d86 2022-10-26 00:49:23 Vitalik Buterin 0xc79b96044906550a5652bcf20a6ea02f139b9ae5 3.923686 0.000260 12.383671

You can check the results by comparing it with the Etherscan table.

3.1.2 Bitquery¶

Now we will try to extract the same transactions data we just obtained using the Etherscan's API but now using another API, the Bitquery API.

Bitquery has an interactive IDE to build GraphQL-like queries that allow to retreive blockhain data.

Exercise: Go to the Bitquery IDE and try to build a query to retrieve transactions where the "To" field is the Vitalik Buterin (0xAb5801a7D398351b8bE11C439e05C5B3259aeC9B) address and only for 2022-10-26. Similarly changing just one line of the query, you should be able to retrieve the transactions where the "From" field is the Vitalik Buterin address. At the time of writting this tutorial, I have not found a way to retreive transactions that have either the "To" field or the "From" field with a particular address as value using a SINGLE query (If you find out, please contact me ;). There is a question in the Bitquery forum asking for the same). Try to add to the query the Bitquery fields that you think will return the same values as the variables ["hash", "date", "from", "to", "value (ETH)", "Txn Fee (ETH)", "gasPrice (GWEI)"] we just saw in the last section using Etherscan.

bitquery build a query

The solution of the first query can be seen in the figure below. (Note that in the final query we are using we have included the block timestamp not the date as was shown in the figure above, since block timestamp variable provides us not only with the date but also with the time. date variable misses the time - hour, minut, seconds - info). (Note: the currency field was included just to check that what is transferred is ETH).

bitquery build a query

To sum up, the query to retrieve transactions where "To" is the Vitalik Buterin address is :

gql

query TransactionsToVB26October2022 {
  ethereum {
    transactions(
      txTo: {is: "0xAb5801a7D398351b8bE11C439e05C5B3259aeC9B"}
      options: {desc: "block.timestamp.time"}
    ) {
      gasPrice
      gasValue
      hash
      to {
        address
      }
      sender {
        address
      }
      gas
      currency {
        address
        name
        symbol
      }
      amount
      block(height: {gteq: 15828766, lteq: 15834718}) {
        height
        timestamp{
            time(format: "%Y-%m-%d %H:%M:%S")
          }
      }
    }
  }
}

and the query for transactions where "From" is the Vitalik Buterin address would be the same but with the following change:

txTo: {is: "0xAb5801a7D398351b8bE11C439e05C5B3259aeC9B"} -> txSender: {is: "0xAb5801a7D398351b8bE11C439e05C5B3259aeC9B"}

Lets create a function that generalizes the query above with some parameters:

In [13]:
def get_transactions_query(address: str = "0xAb5801a7D398351b8bE11C439e05C5B3259aeC9B",
                           start_block: int = 15828766,
                           end_block: int = 15834718,
                           transactions_to_address: bool = True) -> str:
    """
    Given an <address> (contract or wallet address), a <start_block> block number and a <end_block> block
    number. 

    Return:

    - If <transactions_to_address> is True, a query to retrieve transactions that belong to blocks with 
    block number between <start_block> and <end_block>, and so that the "To" address is <address>. 

    - If <transactions_to_address> is False, the same query as when True but so that the "From" address 
    (the sender) is <address>.

    """

    address_direction_field = "txTo" if transactions_to_address else "txSender"
    date_format = "%Y-%m-%d %H:%M:%S"

    query_str = """   
        query Transactions {
          ethereum {
            transactions(
              %s: {is: "%s"}
              options: {desc: "block.timestamp.time"}
            ) {
              gasPrice
              gasValue
              hash
              to {
                address
              }
              sender {
                address
              }
              gas
              currency {
                address
                name
                symbol
              }
              amount
              block(height: {gteq: %d, lteq: %d}) {
                height
                timestamp{
                    time(format: "%s")
                  }
              }
            }
          }
        }
    """ % (address_direction_field, address, start_block, end_block, date_format)

    return query_str

Check for the first query:

In [14]:
transactions_to_vb_query = get_transactions_query(transactions_to_address = True)
print(transactions_to_vb_query)
   
        query Transactions {
          ethereum {
            transactions(
              txTo: {is: "0xAb5801a7D398351b8bE11C439e05C5B3259aeC9B"}
              options: {desc: "block.timestamp.time"}
            ) {
              gasPrice
              gasValue
              hash
              to {
                address
              }
              sender {
                address
              }
              gas
              currency {
                address
                name
                symbol
              }
              amount
              block(height: {gteq: 15828766, lteq: 15834718}) {
                height
                timestamp{
                    time(format: "%Y-%m-%d %H:%M:%S")
                  }
              }
            }
          }
        }
    

Check for the second query:

In [15]:
transactions_from_vb_query = get_transactions_query(transactions_to_address = False)
print(transactions_from_vb_query)
   
        query Transactions {
          ethereum {
            transactions(
              txSender: {is: "0xAb5801a7D398351b8bE11C439e05C5B3259aeC9B"}
              options: {desc: "block.timestamp.time"}
            ) {
              gasPrice
              gasValue
              hash
              to {
                address
              }
              sender {
                address
              }
              gas
              currency {
                address
                name
                symbol
              }
              amount
              block(height: {gteq: 15828766, lteq: 15834718}) {
                height
                timestamp{
                    time(format: "%Y-%m-%d %H:%M:%S")
                  }
              }
            }
          }
        }
    

Once we have the queries we want to use to retrieve data, we need a function to do a POST request to the Bitquery API. For Etherscan, we put the apikey as a parameter of the URL we were posting to, whereas now for bitquery we need to include the API key as a parameter in the Request header as you can see in the run_query function below. Also, the query string is sent as a JSON in the json parameter of the requests.post method.

In [16]:
def run_query(
    query: str, api_key: str
) -> typing.Optional[
    typing.Dict[str, typing.Any]
]:  
    """
    A simple function to use requests.post to make the API call to Bitquery API.
    """

    headers = {"X-API-KEY": api_key}
    request = requests.post(
        "https://graphql.bitquery.io/",
        json={"query": query},
        headers=headers,
    )
    if request.status_code == 200:
        return request.json()  # could be None
    else:
        return {"error": {"status_code": request.status_code}}

Call the run_query function to the two queries that retrieve the transactions we want:

In [17]:
transactions_to_vb_json = run_query(transactions_to_vb_query, BITQUERY_API_KEY)
transactions_from_vb_json = run_query(transactions_from_vb_query, BITQUERY_API_KEY)

Explore how the json responses look like:

In [18]:
transactions_to_vb_json
Out[18]:
{'data': {'ethereum': {'transactions': [{'gasPrice': 17.364768768,
     'gasValue': 0.000380496823451648,
     'hash': '0x158fbc1f6d8b44409c4adfe0b51bb255f61b36b00a32fa0d1bcd80cc239cf07e',
     'to': {'address': '0xab5801a7d398351b8be11c439e05c5b3259aec9b'},
     'sender': {'address': '0x1eaf03af983d1e20b1bab73964906c281e141d4f'},
     'gas': 21912.0,
     'currency': {'address': '-', 'name': 'Ether', 'symbol': 'ETH'},
     'amount': 0.0,
     'block': {'height': 15834718,
      'timestamp': {'time': '2022-10-26 20:44:23'}}},
    {'gasPrice': 31.584090112,
     'gasValue': 0.00085428644347904,
     'hash': '0xd0cdc5ee47422c606de85854acdd70a4178145570b907fc9e3d5aaf0af7d2245',
     'to': {'address': '0xab5801a7d398351b8be11c439e05c5b3259aec9b'},
     'sender': {'address': '0x1eaf03af983d1e20b1bab73964906c281e141d4f'},
     'gas': 27048.0,
     'currency': {'address': '-', 'name': 'Ether', 'symbol': 'ETH'},
     'amount': 0.0,
     'block': {'height': 15833688,
      'timestamp': {'time': '2022-10-26 17:17:35'}}}]}}}

We see that the list of transactions are stored in a "third key level": "data" -> "ethereum" -> "transactions".

Retrieve the list of transactions data from the json (python dict) and join them to have the list of all transactions:

In [19]:
transactions_to_list = transactions_to_vb_json["data"]["ethereum"]["transactions"]
transactions_from_list = transactions_from_vb_json["data"]["ethereum"]["transactions"]

transactions_to_and_from = transactions_to_list + transactions_from_list
print("List of all transactions:\n")
transactions_to_and_from
List of all transactions:

Out[19]:
[{'gasPrice': 17.364768768,
  'gasValue': 0.000380496823451648,
  'hash': '0x158fbc1f6d8b44409c4adfe0b51bb255f61b36b00a32fa0d1bcd80cc239cf07e',
  'to': {'address': '0xab5801a7d398351b8be11c439e05c5b3259aec9b'},
  'sender': {'address': '0x1eaf03af983d1e20b1bab73964906c281e141d4f'},
  'gas': 21912.0,
  'currency': {'address': '-', 'name': 'Ether', 'symbol': 'ETH'},
  'amount': 0.0,
  'block': {'height': 15834718, 'timestamp': {'time': '2022-10-26 20:44:23'}}},
 {'gasPrice': 31.584090112,
  'gasValue': 0.00085428644347904,
  'hash': '0xd0cdc5ee47422c606de85854acdd70a4178145570b907fc9e3d5aaf0af7d2245',
  'to': {'address': '0xab5801a7d398351b8be11c439e05c5b3259aec9b'},
  'sender': {'address': '0x1eaf03af983d1e20b1bab73964906c281e141d4f'},
  'gas': 27048.0,
  'currency': {'address': '-', 'name': 'Ether', 'symbol': 'ETH'},
  'amount': 0.0,
  'block': {'height': 15833688, 'timestamp': {'time': '2022-10-26 17:17:35'}}},
 {'gasPrice': 12.383671296,
  'gasValue': 0.000260057098878976,
  'hash': '0x4655ccc7f011a1e6ce378c1e781fd1c26290f655564185f724cc125b41707d86',
  'to': {'address': '0xc79b96044906550a5652bcf20a6ea02f139b9ae5'},
  'sender': {'address': '0xab5801a7d398351b8be11c439e05c5b3259aec9b'},
  'gas': 21000.0,
  'currency': {'address': '-', 'name': 'Ether', 'symbol': 'ETH'},
  'amount': 3.923686,
  'block': {'height': 15828766, 'timestamp': {'time': '2022-10-26 00:49:23'}}}]

Convert the list of transactions to a pandas DataFrame using the json_normalize method:

In [20]:
raw_bitquery_df = pd.json_normalize(transactions_to_and_from)
raw_bitquery_df
Out[20]:
gasPrice gasValue hash gas amount to.address sender.address currency.address currency.name currency.symbol block.height block.timestamp.time
0 17.364769 0.000380 0x158fbc1f6d8b44409c4adfe0b51bb255f61b36b00a32fa0d1bcd80cc239cf07e 21912.0 0.000000 0xab5801a7d398351b8be11c439e05c5b3259aec9b 0x1eaf03af983d1e20b1bab73964906c281e141d4f - Ether ETH 15834718 2022-10-26 20:44:23
1 31.584090 0.000854 0xd0cdc5ee47422c606de85854acdd70a4178145570b907fc9e3d5aaf0af7d2245 27048.0 0.000000 0xab5801a7d398351b8be11c439e05c5b3259aec9b 0x1eaf03af983d1e20b1bab73964906c281e141d4f - Ether ETH 15833688 2022-10-26 17:17:35
2 12.383671 0.000260 0x4655ccc7f011a1e6ce378c1e781fd1c26290f655564185f724cc125b41707d86 21000.0 3.923686 0xc79b96044906550a5652bcf20a6ea02f139b9ae5 0xab5801a7d398351b8be11c439e05c5b3259aec9b - Ether ETH 15828766 2022-10-26 00:49:23

Lets now rename the columns so that we can compare with the results we obtained using the Etherscan API.

Note that:

  • gasPrice is yet in GWEI
  • gasValue is in ETH and is the feature that we called Txn Fee (ETH)
  • and "value" which is now the feature amount is yet in ETH too
In [21]:
bitquery_table_for_etherscan_comparison = raw_bitquery_df.rename(columns = {
    "gasPrice": "gasPrice (GWEI)",
    "gasValue": "Txn Fee (ETH)",
    "amount": "value (ETH)",
    "to.address": "to",
    "sender.address": "from",
    "block.height": "blockNumber",
    "block.timestamp.time": "date"
    
})[ETHERSCAN_TRANSACTIONS_TABLE_COLUMNS].replace("0xab5801a7d398351b8be11c439e05c5b3259aec9b", "Vitalik Buterin")

# Table obtained using the Bitquery API to compare with the results obtained with Etherescan
bitquery_table_for_etherscan_comparison
Out[21]:
hash date from to value (ETH) Txn Fee (ETH) gasPrice (GWEI)
0 0x158fbc1f6d8b44409c4adfe0b51bb255f61b36b00a32fa0d1bcd80cc239cf07e 2022-10-26 20:44:23 0x1eaf03af983d1e20b1bab73964906c281e141d4f Vitalik Buterin 0.000000 0.000380 17.364769
1 0xd0cdc5ee47422c606de85854acdd70a4178145570b907fc9e3d5aaf0af7d2245 2022-10-26 17:17:35 0x1eaf03af983d1e20b1bab73964906c281e141d4f Vitalik Buterin 0.000000 0.000854 31.584090
2 0x4655ccc7f011a1e6ce378c1e781fd1c26290f655564185f724cc125b41707d86 2022-10-26 00:49:23 Vitalik Buterin 0xc79b96044906550a5652bcf20a6ea02f139b9ae5 3.923686 0.000260 12.383671

Comparing the results from both tables, we see that they are exactly the same except for a very small rounding error difference in the gasPrice (GWEI) column (31.584090 v.s. 31.584089 value).

In [22]:
# Table obtained before using Etherscan
df_etherscan[ETHERSCAN_TRANSACTIONS_TABLE_COLUMNS]
Out[22]:
hash date from to value (ETH) Txn Fee (ETH) gasPrice (GWEI)
0 0x158fbc1f6d8b44409c4adfe0b51bb255f61b36b00a32fa0d1bcd80cc239cf07e 2022-10-26 20:44:23 0x1eaf03af983d1e20b1bab73964906c281e141d4f Vitalik Buterin 0.000000 0.000380 17.364769
1 0xd0cdc5ee47422c606de85854acdd70a4178145570b907fc9e3d5aaf0af7d2245 2022-10-26 17:17:35 0x1eaf03af983d1e20b1bab73964906c281e141d4f Vitalik Buterin 0.000000 0.000854 31.584089
2 0x4655ccc7f011a1e6ce378c1e781fd1c26290f655564185f724cc125b41707d86 2022-10-26 00:49:23 Vitalik Buterin 0xc79b96044906550a5652bcf20a6ea02f139b9ae5 3.923686 0.000260 12.383671
In [23]:
del df_etherscan, bitquery_table_for_etherscan_comparison, raw_bitquery_df

3.1.3 Coingecko¶

Finally we are going to see how to use the Coingecko API but now in a quite different and perhaps easy way.

For some APIs there are open source providers that offer a client for specific programming languages that allow to retrieve data from the API in an specific language easier. In the case of Coingecko, there is an unofficial Python module developed by the community pycoingecko.

To retrieve information about coins in Coingecko, it is usually used a "coin id" that Coingecko has assigned to each coin or token.

Lets retrieve the list of all coins that are registered in the Coingecko API:

In [24]:
import pycoingecko 

client = pycoingecko.CoinGeckoAPI()
ALL_COINS = client.get_coins_list(include_platform=False) 
# include_platform = True gives the address of the coins in the available platforms
# Remember that Ethereum is not the only blockchain, there are many others such as Bitcoin, and some coins are not in Enthereum
# but they are in other blockchains.
In [25]:
logging.info(f"There are a total of {len(ALL_COINS)}coins (or tokens) listed in Coingecko.")
2022-11-02 13:34:58 [INFO] root - There are a total of 13255coins (or tokens) listed in Coingecko.
In [26]:
all_coins_df = pd.json_normalize(ALL_COINS)
all_coins_df.head(5)
Out[26]:
id symbol name
0 01coin zoc 01coin
1 0-5x-long-algorand-token algohalf 0.5X Long Algorand
2 0-5x-long-altcoin-index-token althalf 0.5X Long Altcoin Index
3 0-5x-long-ascendex-token-token asdhalf 0.5X Long AscendEx
4 0-5x-long-bitcoin-cash-token bchhalf 0.5X Long Bitcoin Cash

We are going to obtain data for just some of the most popular cryptocurrencies: Ethereum, Bitcoin, Cardano, Solana, ROSE (Oasis Network) and the Shiba Inu token.

In [27]:
# Subset the coins we will retrieve data about
coins_to_study_df = all_coins_df.query('id in ["cardano", "ethereum", "oasis-network", "solana", "bitcoin", "shiba-inu"]')
coins_to_study_df
Out[27]:
id symbol name
1629 bitcoin btc Bitcoin
2256 cardano ada Cardano
4214 ethereum eth Ethereum
8393 oasis-network rose Oasis Network
10351 shiba-inu shib Shiba Inu
10676 solana sol Solana
In [28]:
coins_to_study_ids = coins_to_study_df.id.tolist()
coins_to_study_ids
Out[28]:
['bitcoin', 'cardano', 'ethereum', 'oasis-network', 'shiba-inu', 'solana']

Now we are going to use a Python class that I have programmed to handle the retrieval of Historical coin related data from Coingecko, using the pycoingecko client, a coin_id, a range of dates [start_date, end_date] and a currency vs_currency to be used as unit of measurement.

In [29]:
VSCURR = typing.Literal["usd", "eur", "gbp", "jpy"] # dollar, euro, pound and japanese yen


class HistoricalCoinPriceRetriever():
    coin_id: str
    vs_currency: VSCURR
    start_date: datetime
    end_date: datetime
    all_coins_list: typing.List[typing.Any] # client.get_coins_list(include_platform=True)

    def __init__(
        self,

        coin_id: str,
        start_date: datetime,
        end_date: datetime,
        all_coins_list: typing.List[typing.Any],
        vs_currency: VSCURR = "usd",
    ):
        super().__init__()
        self.coin_id = coin_id
        self.vs_currency = vs_currency

        self.start_date = start_date
        self.end_date = end_date

        all_coins = all_coins_list
        candidates = [v for v in all_coins if v["id"] == coin_id]

        if len(candidates) == 0:
            logging.error(f"Coin ID: {self.coin_id}")
            logging.error(f"coins list len: {len(all_coins)}")
            logging.error(
                "Coin ID is not initialized well, please review the model card parameters !!!"
            )
            raise ValueError("Invalid Coin ID")
        elif len(candidates) > 1:
            logging.warning(
                f"There are {len(candidates)} coins with same id {self.coin_id}: {candidates}"
            )

        if self.end_date.timestamp() < self.start_date.timestamp():
            logging.error(f"end_date must be greater than start_date")
            logging.error(
                f"end_date: {datetime.datetime.fromtimestamp(self.end_date.timestamp,pytz.utc).isoformat()}"
            )
            logging.error(
                f"start_date: {datetime.datetime.fromtimestamp(self.start_date.timestamp,pytz.utc).isoformat()}"
            )
            raise ValueError("end_date must be greater than start_date")

    def transform(
        self
    ) -> pd.DataFrame:
        start_date = str(int(self.start_date.timestamp()))
        end_date = str(int(self.end_date.timestamp()))

        client = pycoingecko.CoinGeckoAPI()
        results_data = client.get_coin_market_chart_range_by_id(
            self.coin_id,
            self.vs_currency,
            start_date,
            end_date,  # prices, market_cap, total_volumes
        )
        prices_df = pd.DataFrame(
            results_data["prices"], columns=[
                "datetime", f"price_in_{self.vs_currency}"]
        )
        market_cap_df = pd.DataFrame(
            results_data["market_caps"],
            columns=["datetime1", f"market_cap_in_{self.vs_currency}"],
        )
        volumes_df = pd.DataFrame(
            results_data["total_volumes"],
            columns=["datetime2", f"24h_volume_in_{self.vs_currency}"],
        )
        stats_df = pd.concat([prices_df, market_cap_df, volumes_df], axis=1)
        stats_df = stats_df[
            [
                "datetime",
                f"price_in_{self.vs_currency}",
                f"market_cap_in_{self.vs_currency}",
                f"24h_volume_in_{self.vs_currency}",
            ]
        ]

        stats_df["datetime"] = pd.to_datetime(stats_df["datetime"], unit="ms")
        stats_df["coin_id"] = self.coin_id
        
        return stats_df

For this tutorial, lets retrieve data from January 1st, 2021 to October 1st, 2022 and in USD:

In [30]:
# You can change the values of the three variables below if you want to experiment with different values.
START_DATE_STR = "2021-01-01"
END_DATE_STR = "2022-10-01"
VS_CURRENCY = "usd"

# Do not change code below
START_DATETIME = datetime.strptime(START_DATE_STR, '%Y-%m-%d')
END_DATETIME = datetime.strptime(END_DATE_STR, '%Y-%m-%d')

historical_coin_info_df = None
for i, coin_id in enumerate(coins_to_study_ids):
    logging.info(f"Retrieving historical coin info for coin with id '{coin_id}'...")
    retriever = HistoricalCoinPriceRetriever(coin_id, START_DATETIME, END_DATETIME, ALL_COINS, VS_CURRENCY)
    if i == 0:
        historical_coin_info_df = retriever.transform()
    else:
        historical_coin_info_df_i = retriever.transform()
        historical_coin_info_df = pd.concat([historical_coin_info_df, historical_coin_info_df_i] , axis = 0)
2022-11-02 13:34:59 [INFO] root - Retrieving historical coin info for coin with id 'bitcoin'...
2022-11-02 13:34:59 [INFO] root - Retrieving historical coin info for coin with id 'cardano'...
2022-11-02 13:34:59 [INFO] root - Retrieving historical coin info for coin with id 'ethereum'...
2022-11-02 13:34:59 [INFO] root - Retrieving historical coin info for coin with id 'oasis-network'...
2022-11-02 13:35:00 [INFO] root - Retrieving historical coin info for coin with id 'shiba-inu'...
2022-11-02 13:35:00 [INFO] root - Retrieving historical coin info for coin with id 'solana'...
In [31]:
historical_coin_info_df.head()
Out[31]:
datetime price_in_usd market_cap_in_usd 24h_volume_in_usd coin_id
0 2021-01-01 29022.418395 5.394380e+11 4.350352e+10 bitcoin
1 2021-01-02 29352.126792 5.455933e+11 3.408972e+10 bitcoin
2 2021-01-03 32163.824935 5.978877e+11 5.727344e+10 bitcoin
3 2021-01-04 33008.226203 6.136169e+11 1.788941e+11 bitcoin
4 2021-01-05 31515.575967 5.857263e+11 7.465717e+10 bitcoin

In the following plot, you can see the coin prices in "VS_CURRENCY" units.

Since the values of different coins are at different scales, to see each of them individually, you can double click one of the coin_id values in the legend of the plot. Similarly, we show the plots for the coin market cap and the 24h volume.

In [32]:
fig = px.line(historical_coin_info_df, x="datetime", y=f"price_in_{VS_CURRENCY}", color = "coin_id", 
              title = f"Coin prices in '{VS_CURRENCY}' from {START_DATE_STR} to {END_DATE_STR}"
             )
fig.show()
# save_html_plot(fig, os.path.join(OUTPUTS_PATH, f"coin_prices_in_{VS_CURRENCY}_from_{START_DATE_STR}_to{END_DATE_STR}.html"))
In [33]:
fig = px.line(historical_coin_info_df, x="datetime", y=f"market_cap_in_{VS_CURRENCY}", color = "coin_id", 
              title = f"Market cap in '{VS_CURRENCY}' from {START_DATE_STR} to {END_DATE_STR}"
             )
fig.show()
# save_html_plot(fig, os.path.join(OUTPUTS_PATH, f"coin_marketcaps_in_{VS_CURRENCY}_from_{START_DATE_STR}_to{END_DATE_STR}.html"))
In [34]:
fig = px.line(historical_coin_info_df, x="datetime", y=f"24h_volume_in_{VS_CURRENCY}", color = "coin_id", 
              title = f"24h Volume in '{VS_CURRENCY}' from {START_DATE_STR} to {END_DATE_STR}"
             )
fig.show()
# save_html_plot(fig, os.path.join(OUTPUTS_PATH, f"coin_24hvolumes_in_{VS_CURRENCY}_from_{START_DATE_STR}_to{END_DATE_STR}.html"))

3.2 Usecases¶

Usecase I: DEX transactions¶

In this usecase example we retrieve DEX transactions data to see which are the most popular sold or bought tokens for a particular period of time. The code developed to solve this usecase is found in the src/usecases/DEX_trades module.

Background

  • DEX = Decentralized Exchange.
  • DEX == (an exchange) "Protocol"
  • What are DEX exchanges?
  • What is the Wrapped ETH?.
In [35]:
USECASE_PATH = "src/usecases/DEX_trades"
from src.usecases.DEX_trades.dex_trades_retriever import DEXTradesRetriever

display(Markdown(open(os.path.join(USECASE_PATH, "README.md"), encoding='utf-8').read()))

DEXTradesRetriever¶

Description¶

DEXTradesRetriever class allows to retrieve DEX (swap) transactions (from the ethereum network) data for a given list of protocols or for all available protocols, and between a range of dates. Additionally, an extra output DataFrame with the count of dex trades per protocol is returned in the ProtocolDexTrCounts output DataFrame.

At least, the following fields are obtained in the DEX_Trades output DataFrame:

  • transaction_from_address: usually the wallet address from which the swap transaction has been performed.
  • datetime: date of the transaction (datetime format).
  • protocol: transaction swap protocol (ex: Uniswap v3)
  • total_trade_amount_usd: amount of USD that quantifies the transaction.
  • sold_token_symbol: symbol of the token that is sold.
  • sold_tokens: number of sold tokens (ex: 20 SOLANA)
  • bought_tokens: number of bought tokens (ex: 1 WETH).
  • bought_token_symbol: symbol of the token that is bought.
  • transaction_gas_value: amount of gas (in ETH) wasted in the transaction (transaction fee).
  • bought_token_address: contract address of the bought token.
  • sold_token_address: contract address of the sold token.
  • transaction_hash: hash of the transaction.

For the ProtocolDexTrCounts output DataFrame, the columns are just two:

  • protocol: protocol name.
  • count: number of dex transactions, in the specified range of dates, that have been done via the given protocol.

Note: whether or not wallet_addresses parameter is specified, the ProtocolDexTrCounts dataframe will contain the counts over all DEX transactions, not only transactions for the specified address, for the specified date range.

Tip: To convert the column transaction_gas_value from ETH to USD in order to know the value in USD of the gas at the time of the transactions, it is necessary to know the price of ETH in USD at the time of the transaction.

Warning: When the protocols parameter is not specified (i.e., left empty), all available protocols at the time of running the transform method of the retriever will be considered. For this case, keep in mind that the running time can be huge, although specifying just a cuple of days via the start_date and end_date parameters.

Class Parameters¶

  • api_key: Your Bitquery API key. Type: string.
  • protocols: Optional. List of DEX swap protocols to be considered when retrieving transactions. If not specified, all available protocols at the time of running the button will be considered. Type: list of strings.
  • start_date: Initial date to start retrieving dex transactions from (the specified date is included). Type: datetime.datetime object.
  • end_date: Final date to stop retrieving dex transactions (the specified date is included). Type: datetime.datetime object.
  • wallet_addresses: Optional. List of wallet addresses to retrieve DEX transactions from. Type: list of strings.

DEX trades during Halloween 2022 "holidays"¶

In [36]:
# You can change the values of the three variables below if you want to experiment with different values.
START_DATE_STR = "2022-10-31"
END_DATE_STR = "2022-11-01"

# Do not change code below
START_DATETIME = datetime.strptime(START_DATE_STR, '%Y-%m-%d')
END_DATETIME = datetime.strptime(END_DATE_STR, '%Y-%m-%d')

trades_retriever = DEXTradesRetriever(api_key=BITQUERY_API_KEY, start_date=START_DATETIME, end_date=END_DATETIME)
In [37]:
usecase_outputs_path = os.path.join(OUTPUTS_PATH, "DEX_trades")
dex_trades_file = f"dex_trades_{START_DATE_STR}_{END_DATE_STR}.csv"
protocols_file = f"protocols_count_{START_DATE_STR}_{END_DATE_STR}.csv"

dex_trades_path = f"{usecase_outputs_path}/{dex_trades_file}"
protocols_path =  f"{usecase_outputs_path}/{protocols_file}"
In [38]:
if os.path.exists(dex_trades_path) and os.path.exists(protocols_path):
    logging.warn(f"Existing data for the time range [{START_DATETIME},{END_DATETIME}]. Reading DEX trades and protocols dataframes from file.")
    protocols_df = pd.read_csv(protocols_path)
    trades_df =  pd.read_csv(dex_trades_path)
else:
    halloween_trades_dict = trades_retriever.transform() # ~45 min running time for 5 days. ~ 15min for 2 days
    protocols_df = halloween_trades_dict["ProtocolDexTrCounts"]
    trades_df = halloween_trades_dict["DEX_Trades"]
                                      
    protocols_df.to_csv(protocols_path, index = False)
    trades_df.to_csv(dex_trades_path, index = False)
2022-11-02 13:35:47 [WARNING] root - Existing data for the time range [2022-10-31 00:00:00,2022-11-01 00:00:00]. Reading DEX trades and protocols dataframes from file.
C:\Users\daniel.salgado\AppData\Local\Temp\ipykernel_22932\836146019.py:2: DeprecationWarning:

The 'warn' function is deprecated, use 'warning' instead

In [39]:
trades_df.shape
Out[39]:
(332261, 12)
Protocol (DEX) popularity for trading¶
In [40]:
protocos_df = protocols_df.sort_values(by = "count", ascending = False)
In [41]:
# Top 10 protocols
protocos_df.head(10)
Out[41]:
protocol count
30 Uniswap v2 256214
31 Uniswap v3 88975
4 Balancer v2 4096
33 Zerox Exchange v2 1453
35 Zerox Exchange v4 1387
3 Balancer Pool Token 986
6 Bancor Network v2 858
8 Curve 359
12 Dodo 299
29 Uniswap 197
In [42]:
sns.barplot(data=protocos_df.head(10), x = "count", y="protocol");
Top 15 most sold tokens¶
In [43]:
sold_tokens_counts = trades_df["sold_token_symbol"].value_counts().reset_index(name="counts").rename(columns={"index": "token_symbol"})
sold_tokens_counts.head(15)
Out[43]:
token_symbol counts
0 WETH 125815
1 USDC 28865
2 USDT 12461
3 DAI 3911
4 Vine 3868
5 SHIB 2433
6 WBTC 2419
7 LMEOW 2296
8 USHI 2104
9 BUSHI 1967
10 Error in symbol 1913
11 TITTER 1810
12 TAMA 1702
13 HEX 1553
14 $TRUMP 1540
In [44]:
sns.barplot(data=sold_tokens_counts.head(15), x = "counts", y="token_symbol");
Top 15 most bought tokens¶
In [45]:
bought_tokens_counts = trades_df["bought_token_symbol"].value_counts().reset_index(name="counts").rename(columns={"index": "token_symbol"})
bought_tokens_counts.head(15)
Out[45]:
token_symbol counts
0 WETH 163475
1 USDC 28853
2 USDT 13159
3 DAI 4178
4 Vine 2619
5 WBTC 2486
6 SHIB 1969
7 LMEOW 1494
8 HEX 1382
9 $TRUMP 1325
10 Error in symbol 1307
11 DC 1200
12 TITTER 1175
13 TAMA 1161
14 XEN 978
In [46]:
sns.barplot(data=bought_tokens_counts.head(15), x = "counts", y="token_symbol");

Vitalik Buterin DEX trades in 2022¶

In [47]:
VITALIK_BUTERIN_ADDRESS = "0xab5801a7d398351b8be11c439e05c5b3259aec9b"

# You can change the values of the three variables below if you want to experiment with different values.
START_DATE_STR = "2022-01-01"
END_DATE_STR = "2022-12-31"

# Do not change code below
START_DATETIME = datetime.strptime(START_DATE_STR, '%Y-%m-%d')
END_DATETIME = datetime.strptime(END_DATE_STR, '%Y-%m-%d')
WALLET_ADDRESSES = [VITALIK_BUTERIN_ADDRESS]

vb_trades_retriever = DEXTradesRetriever(api_key=BITQUERY_API_KEY, start_date=START_DATETIME, end_date=END_DATETIME,
                                      wallet_addresses=WALLET_ADDRESSES)
In [48]:
vb_df_dict = vb_trades_retriever.transform()
2022-11-02 13:35:50 [INFO] root - Retrieving number of DEX transactions per protocol data...
2022-11-02 13:35:51 [INFO] root - Retrieving DEX transactions all available protocols, from 2022-01-01 to 2022-12-31...
2022-11-02 13:35:51 [WARNING] root -         - Waiting for 5.5 seconds until next request to the Bitquery Api due to free plan limits.
2022-11-02 13:35:58 [WARNING] root -         - Waiting for 4.6 seconds until next request to the Bitquery Api due to free plan limits.
2022-11-02 13:36:05 [INFO] root -     Finished in 2 iterations.
In [49]:
vb_df_dict.keys()
Out[49]:
dict_keys(['DEX_Trades', 'ProtocolDexTrCounts'])
In [50]:
vb_dex_trades = vb_df_dict["DEX_Trades"]
vb_dex_trades_by_protocol = vb_df_dict["ProtocolDexTrCounts"]
In [51]:
vb_dex_trades.head()
Out[51]:
transaction_from_address datetime protocol total_trade_amount_usd sold_token_symbol sold_tokens bought_tokens bought_token_symbol transaction_gas_value bought_token_address sold_token_address transaction_hash
0 0x24f8e4f5188f63a882e453d10636a0112fd6c7e1 2022-01-04 14:09:18 Uniswap v2 76962.197266 FOMO 1.482848e+12 20.000 WETH 0.012183 0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2 0xb68f32410a7dd4cf7a1ae18c6b6ddefa2eed80b3 0xdea19cb511080b06ec705105abb168609066d2a29a2a8e5082a6d4d6db758ea8
1 0xf5c0f8388d32ee34deb0e34fa9eff279f54448c4 2022-01-06 21:20:25 Uniswap v2 13721.250977 SHIBA DAO 6.922182e+10 4.000 WETH 0.015196 0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2 0xf1656ae9e8227da5ebb93406e2edd74d9820d0ad 0xe4a50708b27563b031d2f16b2535c6f08e22bb9a7cbec171d2b90a4e5599a9fa
2 0x6081bb069380cd45753b21857b259a0be76f23e7 2022-01-08 06:26:52 Uniswap v2 4826.196533 RON 5.095918e+09 1.500 WETH 0.009534 0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2 0xebfc9d08f47b6b4411eaf5b97f1fd7d9b6d2929c 0xe3e93539f72e1e6695b6fc39b4bdd99e587cc768b24bfd3464ba7e4023105052
3 0x7699833498006a7a8f42835a842e124c8f9a3ae3 2022-01-10 05:05:07 Uniswap v2 5949.725798 WEB3.0 1.253301e+10 1.873 WETH 0.011379 0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2 0xd1bfb50bcb96635cf13fe50ba507ebbf37b09234 0xbc74997dccb810a6819fa9fe3cef44b8f1b73b96cad0d3a5903f7ffe54589d4f
4 0x012c192653fefdf2b4626011ae8cecab40c2ed86 2022-01-12 11:59:53 Uniswap v2 27681.426636 LOVE 6.738590e+10 8.500 WETH 0.014988 0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2 0x2efc83c953f7a6a059744270c358edd6162f89d7 0x4ca1f1184e5150992f463107378c510f22245921ecd76d60d12f56be9a6e4689
Most sold tokens¶
In [52]:
sold_tokens_counts = vb_dex_trades["sold_token_symbol"].value_counts().reset_index(name="counts").rename(columns={"index": "token_symbol"})
sold_tokens_counts.head(10)
Out[52]:
token_symbol counts
0 SHIB 15
1 GAL 2
2 USDC 2
3 Ape DAO 2
4 UkraineDAO 2
5 $NOTHING 1
6 GOAL 1
7 ROPs 1
8 VYPER 1
9 DETH 1
In [53]:
sns.barplot(data=sold_tokens_counts.query('counts > 1'), x = "counts", y="token_symbol");
Most bought tokens¶
In [54]:
bought_tokens_counts = vb_dex_trades["bought_token_symbol"].value_counts().reset_index(name="counts").rename(columns={"index": "token_symbol"})
bought_tokens_counts.head(10)
Out[54]:
token_symbol counts
0 WETH 61
In [55]:
sns.barplot(data=bought_tokens_counts.query('counts > 1'), x = "counts", y="token_symbol");

3.3 Exercises¶

3.3.1 (Exercises) Basic Introduction to API requests¶

Basic Exercise I: Transactions timeseries for a particular Wallet address and Time range. (Beginner)¶

The goal of this exercise is to obtain, using the Bitquery API, all Normal transactions where given address is participating (either as 'To' or 'From' address) and for anarbitrary time range [start_date, end_date].

In section 3.1.2 we did it using the start block and end block numbers, which gave the particular case of transactions for the date 2022-10-26. The goal now is to use dates instead of block numbers, and generalize to a range of dates.

  1. You should write a function similar to the get_transactions_query function, now using two new parameters start_date and end_date, not block numbers. Tip: Use the Bitquery IDE to build the query and look for some "date" filter field.

  2. Retrieve all transactions from January 1st 2022 to October 1st 2022 for the Vitalik Buterin address (0xAb5801a7D398351b8bE11C439e05C5B3259aeC9B). Create the following plots:

    • A count plot showing how many transactions have the Vitalik Buterin address as receiver and how many as sender.
    • A line plot to show how many transactions per day there are (You could add three lines in the same plot: one taking into account all transactions, another for only "sent" transactions and another for only "received" transactions). You may use plotly express to obtain an interactive plot (px.line).

Extra: Try the same exercise but using datetime ranges (i.e., include Hour,Minut, Second precision) instead of just a raw date (i.e., year-month-date).

In [56]:
# Code your function below
In [57]:
# Check your function for transactions where the 'To' address is the input address

# <Replace this with your code>
In [58]:
# Check your function for transactions where the 'From' address is the input address

# <Replace this with your code>
In [59]:
# Retrieve transactions and store as pandas dataframe
In [60]:
# Preprocess and prepare the dataframes for visualizations
In [61]:
# Count plot
In [62]:
# Line plot
In [ ]:
 

Basic Exercise II: Historical coin info using Coingecko API without a client (Beginner)¶

The goal of this exercise is to reproduce Section 3.1.3 where we used the pycoingecko Python client as a handler of the Coingecko API to retrieve data, but now do the same from scratch, using one or more endpoints from the Coingecko API.

  1. Explore the endpoints of the Coingecko API Coingecko API and see which of them (It may be just one endpoint) allows you to retrieve the necessary data to reproduce Section 3.1.3.

  2. Retrieve data and build the corresponding dataframe.

  3. Make the historical coin price, market cap and 24h volum visualizations in DOLLAR and EURO.

Tip: You may need to use a similar function to make requests to the Coingecko API as we did when using Etherscan API in section 3.1.1 but now without needing an API key.

Extra: You may explore more historical coin related data features and make additional plots for them.

In [63]:
# Create cells and add code as needed
In [ ]:
 

3.3.2 (Exercises) Usecases¶

Usecase Exercise I: Last Week Most Traded Tokens in DEX (Intermediate)¶

The goal of this usecase exercise is to obtain statistics of which Tokens/Coins have been traded more frequently (sold and/or bought) in the last week (or an arbitrary week starting from a monday to a sunday) and in DEX (Decentralized Exchanges).

This would provide insights of which tokens/coins are currently being bought or sold by the community. Trades consist on interchanging one cryptocurrency by another. For example, I could trade (sell) 1 ETH for an equivalent quantity of "BTC" (buy), where the quantity bought would be calculated using the current price of BTC and accounting also for the Fees payed to do the transactions involved in that trade.

Exercise: Use the Bitquery API to:

  • Obtain a ranking of the top N (e.g. N = 10) most traded crypto in DEX for a given week.
  • Visualize using bar plots and count plots to separate whether the tokens studied are sold or bought.
  • Make any other visualizations or analysis to give insights about the DEX trades performed during that week.

Tips:

  • Fix a concrete week to work with (a start and end date).
  • You may need to perform a loop with multiple queries, separating in time ranges, since the free Bitquery API plan has limited requests per minute and limited "computational power" per query.
  • Some information that would be useful: sold currency, bought currency, date, maker, taker, ... Explore and investigate the available features for DEX trades.
In [64]:
# Create cells and add code as needed
In [ ]:
 

Usecase Exercise II: Ethereum Biggest Moves (Advanced)¶

The goal of this usecase exercise is to totally or partially reproduce the dashboard shown in this Dune-app Dashboard to shows an overview of the 'Biggest Moves' in the Ethereum ecosystem in a particular timeframe (default 1 day, top 10)

  • Largest DEX Trades
  • Largest NFT Sales
  • Highest ETH Fee Paid
  • Largest ERC20 Transfers
  • Largest ETH amounts moved

Exercise: Investagate how to obtain at least one of the tables listed above (for example, Largest ETH amounts moved or Largest DEX Trades) using some public API (Bitquery, Etherscan,...).

In [65]:
# Create cells and add code as needed
In [ ]:
 

4. Additional resources and links¶

Blockchain APIs¶

  • Bitquery (Requires registration to get API key)
  • Coingecko
  • Coinmarketcap (Requires registration to get API key)
  • Coinbase (Requires a Coinbase account)
  • Etherscan (Requires registration to get a quite limited free plan API key)
  • NOWNodes (Requires registration to get API key)
  • Moralis (Requires registration to get API key)

Links¶

  • AI and Blockchain

  • How to choose the best blockchain api for your project

Courses & Tutorials¶

  • (Beginner) Python - The Practical Guide [2022 Edition]: Students can dive into Web Development or Data Science with Python; Students may build upon the Python and Blockchain knowledge gained in this course and start working on their own Blockchain. (~18h duration).
  • (Intermediate) Coursera Blockchain Specialization (4 Courses). It is ideal for programmers and designers involved in developing and implementing blockchain applications, and anyone who is interested in understanding its potential. Prerequisites: Knowledge of at least one modern, high-level programming language is required.
  • (Intermediate) Ethereum and Solidity: The Complete Developer's Guide. Use Ethereum, Solidity, and Smart Contracts to build production-ready apps based on the blockchain. Prerequisites: Basic Knowledge of Javascript and NPM. A Mac, PC, or Linux Machine.
  • (Beginner to Advanced) Cryptozombies: a series of interactive tutorials to learn about blockchain applications development with Solidity. I recommend taking at least the lesson/subcourse called Making the Zombie Factory, which is quite fast to do and gives a sense of what solidity is.

Web-Apps¶

  • Dune: web-app to create private dashboards containing visualizations that are created from data retrieved from the blockchains.
  • ETH.Build: An Educational Sandbox For Web3... And Much More.

Certifications¶

  • Blockchain Expert Certification. CEBP Certification for Senior Leadership & Decision Makers, Product & Program Managers, Innovation Managers & Entrepeneurs, Advisors & Business Analysts.
  • Certified Enterprise Blockchain Architect. CEBA Certification for Software Architects & Engineers, Software Developers & IT Managers, Enterprise & Cloud Architects, IT Sales Executives & Pre Sales Engineers. As a Data Scientist & Engineer I would start with this one.
  • Certified Blockchain Security Expert. CBSE Certification for Software Architects & Engineers, Software Developers & IT Managers, Enterprise & Cloud Architects, CIOs & IT Managers, Blockchain Developers.

References¶

  • Ethereum official website.
  • Etherscan official website, Etherscan learning resources.
  • The Evolution of Web 1.0 to Web 3.0.
  • From Web1 to Web3: the Web’s re-evolution.

5. Exercise solutions¶

For now, no solutions will be provided

In [ ]: